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Abstract. In this work we focus on the search and detection of Massive black 
hole binary (MBHB) systems, including systems at high redshift. As well as ex- 
panding on previous works where we used a variant of Markov Chain Monte Carlo 
(MCMC), called Metropolis-Hastings Monte Carlo, with simulated annealing, we 
introduce a new search method based on frequency annealing which leads to a 
more rapid and robust detection. We compare the two search methods on sys- 
tems where we do and do not see the merger of the black holes. In the non-merger 
case, we also examine the posterior distribution exploration using a 7-D MCMC 
algorithm. We demonstrate that this method is effective in dealing with the high 
correlations between parameters, has a higher acceptance rate than previously 
proposed methods and produces posterior distribution functions that are close to 
the prediction from the Fisher Information matrix. Finally, after carrying out 
searches where there is only one binary in the data stream, we examine the case 
where two black hole binaries are present in the same data stream. We demon- 
strate that our search algorithm can accurately recover both binaries, and more 
importantly showing that we can safely extract the MBHB sources without con- 
taminating the rest of the data stream. 



PACS numbers : 04.30.-w, 04.30.Db, 04.80.Cc 
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1. Introduction 

1.1. Background. 

Massive black hole binaries (MBHB) are expected to be one of the strongest candidate 
sources for LISA, the Laser Interferometer Space Antenna, a joint ESA-NASA 
mission that will search for gravitational waves (GW) is the frequency bandwidth 
10~ 5 < f /Hz < 1 pp. There is growing observational evidence for the existence of 
Massive black holes (MBH) at the center of most galaxies. Galactic formation models, 
such as Hierarchical structure formation, suggest that modern day MBHs are the direct 
result of the coalescence of smaller "seed" black holes of ~ 1O 5 M throughout cosmic 
history [2] . Observations also suggest direct evidence of a MBHB in the radio galaxy 
0402+379 3J, at the center of the Quasar 3C 345 [4]. While there are other possible 
detections, it is the system 0402+379 that seems to give the most compelling evidence 
of a MBHB. It is suggested that at the center of this system there are two MBHs with 
a combined total mass of ~ 10 8 M Q and an orbital separation of 7.3 pc. 

The detection of MBHBs by LISA is important for two main reasons. Firstly, 
it will allow us to carry out a test of gravity in the highly nonlinear strong-field 
regime El [7]. Secondly, it will allow us, in conjunction with other astronomical 
methods, to investigate such things as galaxy interactions and mergers out to very 
high redshift (z > 10). These systems are also very attractive sources for LISA. 
Due to their high masses, even as these systems approach merger and ringdown, they 
occur at frequencies which are blocked to the ground based detectors because of such 
things as tectonic motion below about 10 Hz and more importantly, time varying 
gravitational potentials such as weather systems etc. Also, unlike galactic binaries 
and Extreme Mass Ratio Inspirals (EMRIs), the MBHBs are very clean sources with 
measureable signal to noise ratios (SNR) of order ~ 100 — 1000's. While the actual 
event rate for MBHBs is still a subject of contention [3 [9], it is unlikely to be high 
enough to lead to confusion noise between sources, something which is very important 
in the search for galactic binaries and to a lesser extent in the search for EMRIs. 

While a lot of work has been done on source modelling, especially for EMRI 
sources, there has not been much research done into developing viable detection 
algorithms for the LISA data analysis problem. One option would be to adopt the 
current ground based inspiral methods and use a template bank to search for each 
source. However, due to the high cost of templates involved for a blind template 
bank searching for a singe source (10 7 for galactic binaries [10], 10 13 for non-spinning 
MBHBs p3] and 10 40 for EMRIs [12]) other methods are needed. While there are 
currently other methods being developed, such as genetic algorithms [13], iterative 
methods [IH [15] , time-frequency methods [16] and tomographic reconstruction [17] , 
we present here a variant of the Markov Chain Monte Carlo method (MCMC) [HI H"9] 
called Metropolis-Hastings Monte Carlo (MHMC). The MCMC algorithm is a useful 
approach to searching through large parameter spaces, performing model comparisons, 
estimating instrument noise and providing error estimates. It has already been 
successfully been used in other astronomical fields such as cosmological parameter 
estimation with the WMAP data [20]. In terms of GWs, the MCMC technique 
has already been applied to ground-based analysis [211 Ell [23], a LISA toy-model 
problem pi [25] and galactic binaries E3[27]. For MBHBs both MCMC and MHMC 
techniques have been successfully applied [HI [28l [29] . 

In this work, we will focus on the detection of gravitational waves from inspiralling 
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Schwarzschild BHs in the Low Frequency Approximation. In the future we intend 
extending our work to include extra complications such as higher harmonic corrections 
to the amplitude, introducing spins and more complicated detector response functions 
to take into account lower mass systems. 

1.2. The LISA detector and response at the detector to GWs. 

LISA is composed of a constellation of three drag- free spacecraft forming an equilateral 
triangle with each side of length L — 5 x 10 6 km. The center of mass of the detector 
follows a heliocentric orbit 20° behind the Earth. The detector triangle is tilted by 
60° with respect to the ecliptic. Due to orbits of each individual spacecraft being on 
different planes, the detector triangle rotates about itself with a period of 1 year. The 
detector itself is an all-sky detector. The LISA configuration allows us to use the three 
detector arms as a pair of two arm detectors. The motion of the detector around the 
Sun introduces a periodic Doppler shift whose magnitude and phase are dependent on 
the angular position of the source in the sky. 

While the full response from the detector involves issues such as fluctuations 
of arm-length, pointing ahead and signal cancellation once the wavelength of the 
GW becomes comparable to the armlength of LISA, we can make an excellent 1st 
approximation called the Low Frequency Approximation (LFA) [30j . By assuming 
that there are no fluctuations in arm length, ignoring the problem of pointing ahead, 
ignoring signal cancellation due to the LISA transfer functions and assuming we can 
evaluate all three spacecraft at the same time, we can safely work in the LFA. Also, as 
long as the frequency of the MBH is less than the transfer frequency of the detector, 
i.e. / <C /* ~ I0~ 2 Hz, the LFA is a good approximation to the total response from 
the detector. In fact it was shown that the LFA is a good approximation to the full 
detector response to a frequency of ~ 3 mHz |3Ij In this approximation the beam 
pattern functions are essentially a quadrupole antennae. 

Working in the limit / <C /* and /// -C L, we can express the strain at the 
detector to an incoming GW with polarizations 7i+,x (t) as 

h(t) = h + (£(t))F + +h x (Z(t))F x , (1) 

where the phase shifted time parameter is 

£(*) =t-R® sin cos (a(t) - <j>) . (2) 

Here, R® = 1AU » 500 sees is the radial distance to the detector guiding center, (8, </>) 
are the position angles of the source in the sky, a(t) = 2tt f m t + k, f m = l/year is the 
LISA modulation frequency and n gives the initial ecliptic longitude of the guiding 
center. The beam pattern functions are defined by 

F+(t) = X - [cos(2^)D+(t; 6, 4>, A) - sin(2^)^ x (t; 6, cf>, A)] , (3) 

F x (t) - i [sin(2V0£>+(t; 0, 0, A) + cos(2^) J D x (t; 0, 0, A)] . (4) 

The quantity ip is the polarization angle of the wave. Formally, if L is the direction 
of the binary's orbital angular momentum and n is the direction from the observer 
to the source (such that the GWs propagate in the — n direction), then ip fixes the 
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orientation of the component of L perpendicular to n. The time dependent quantities 
£)+ >x (t) are given in the LFA by [31] 

/3 

D+(t) = — -36sin 2 (60sin(2a(t)-2A) + (3 + cos(260) (5) 
64 . 

( cos(20) | 9 sin(2A) - sin(4a(i) - 2A)| + sin(2</>) { cos (4a(t) - 2A) - 9 cos(2A) | ) 
-4V3sin(20) ( sin(3a(t) - 2A - 0) - 3sin(a(f) - 2A + 

L> x (t) = i [ V3 cos(6>) f 9 cos(2A - 20) - cos(4a(t) - 2A - 20)") (6) 
16 L V / 

-6 sin(0) ( cos(3a(i) - 2A - <f>) + 3 cos(a(i) - 2A + 0)) , 

where A = and tt/4 give the orientation of the two detectors. We can see from the 
above equations that we do not measure the polarizations individually at each detector, 
but a combination of the polarizations weighted by the beam pattern functions. 

1.3. Organization of the paper. 

The aim of this paper is to extend on previous works on the search for GWs from a 
MBHB pTJ[28l[29] using the LISA detector. In these previous works it was shown that 
it was possible to carry out not only a search for these sources, but also to map out 
the posterior distributions for the errors in the parameter estimation of the source. 

The organization of the rest of the paper is as follows : In Section [2] we define the 
gravitational waveform describing the inspiral phase of the two MBHs. We outline the 
division of the parameters into those extrinsic and intrinsic to the source. We then go 
on to define the restricted post-Newtonian (PN) approximation and finally state the 
post-Newtonian equations describing both the phase and frequency evolution of the 
waveform. 

In Section [3] we outline the theory behind estimating both the source parameters 
and the errors in such an estimation. We briefly outline the geometric nature of the 
problem, before going on to outline the main tools behind our analysis. This will 
entail a definition of the F-Statistic method and a discussion on the confusion noise 
from the galactic foreground. 

In Section [4] we present a discussion of Metropolis-Hastings sampling using 
simulated annealing. We will also present a new alternative method which we describe 
as frequency annealing. 

Section [5] contains a presentation of the results from our analysis as we search for 
single sources in the data stream. Because of the brightness of MBHBs, it is actually 
quite difficult to define an astrophysically reasonable low SNR source in the case where 
we observe the coalescence of the MBHs. While it not hard to have galactic binaries 
with an SNR of ~ 5, to define MBHBs with SNR of ~ 10 we have to assume that 
the time to coalescence is much greater than the time of observation. In a previous 
analysis [11], we have found that it is quite trivial to find sources with SNRs of > 100. 

It is because of this that we have chosen sources with relatively low SNR, covering 
various mass ranges from almost equal to a mass ratio of about 10, with distances 
up to z = 10 and with total mass ranges that show that it should be possible for 
LISA to see the coalescence of "seed" galaxies at cosmological distances. To make 
the problem more realistic, we also introduce a galactic foreground of approximately 
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26 million individually modelled galactic binary sources. This galactic foreground is 
generated using a modern population synthesis code [32] and a fast detector response 
code (33] [34] . At the end of this section we discuss the Markov Chain Monte Carlo 
method we use to analyze the posterior distributions. One of the important parts of 
this section is a discussion on how to compensate for the large correlations between 
the phase at coalescence and the other parameters when we explore the posteriors in 
the case where we do not see coalescence. 

Section contains an analysis of the exploration of the posterior distribution 
functions for each of the parameters. We concentrate on the case where we do not 
see the coalescence of the waveform as this is the more challenging situation. In the 
case where coalescence is seen, the phase at coalescence is sufficiently resolved that 
it minimizes the correlations between the parameters. We have addressed this issue 
in a previous publication [28]. When we do not see the coalescence of the waveform, 
the phase at coalescence is virtually undetermined. We show from looking at a slice 
through the Likelihood surface shows that the error ellipsoid is very long and narrowly 
peaked, like the blade of a knife. This is a problem for any MCMC exploration as 
most proposals will be rejected due to the fact that it is so difficult to stay on the 
knife edge. We propose a scheme where we maximize over the phase at coalescence 
and the luminosity distance and conduct our exploration in 7 dimensions only. 

In Section[7]we examine the search when there are two MBHBs present in the data 
stream. We demonstrate that because of the small overlap between two MBHB sources 
that we can pull out each of the MBHBs individually with virtually no contamination 
to the rest of the data. Our algorithm at present uses an iterative search for the 
MBHBs. 

2. The Gravitational Waveform 

The gravitational waveform from inspiralling bodies is composed of an inspiral, merger 
and ringdown phase. While we believe we have good knowledge of the inspiral and 
ringdown phases, the merger phase is less well understood, though there has been 
great recent progress [553. Due to the recent progress in Numerical Relativity, it 
is hoped that some time in the near future we will be able to model the entire 
waveform. For now, we focus only on the initial inspiral phase. Such an inspiral 
waveform is referred to as a chirp waveform due to the increase of amplitude and 
frequency caused by a slow decay in the orbit of the two component bodies. This 
decay is caused by the loss in energy and angular momentum from the system in the 
form of GWs. In this analysis, we focus on MBHBs where the component bodies are 
Schwarzschild MBHs. In this case the waveform is described by the nine parameter set 
x = {ln(M c ), ln(/i), 0, (/>, ln(i c ), i, ip c , ln(_Di), ip}, where M c is the chirp mass, /i is the 
reduced-mass, (6, (f>) are the sky location of the source, t c is the time-to-coalescence, i 
is the inclination of the orbit of the binary, ip c is the phase of the GW at coalescence, 
Dl is the luminosity distance and tp is the polarization of the GW. We will describe 
the parameter subset {-Dl, i, y c , V'} as being extrinsic parameters, while all the rest 
will be described as being intrinsic, i.e. parameters that describe the dynamics of the 
binary. We should also mention that (9, 4>), which would normally be classed as being 
extrinsic, are classed as being intrinsic due to the fact that they are tied to the motion 
of LISA through the beam pattern functions. 

The GWs are defined as a superposition of harmonics at multiples of the orbital 
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phase. The two harmonics of the wave can be written as 



(*) 



^ m 



(7) 



where m is the harmonic index, $ or f, is the orbital phase and are the amplitude 

corrections associated with each harmonic. The strongest harmonic of the wave is 
the one given by the quadrupole moment of the source, corresponding to m = 2. In 
the restricted post-Newtonian approximation we neglect all other harmonics besides 
the quadrupole term in the amplitude, but expand the corrections in the phase of the 
wave up to n-PN order, i.e. ~ (v/c) 2n . We should point out that this approximation 
neglects the phase modulations introduced by the other amplitude corrections. In a 
future publication we will examine the consequences of searching for a signal with 
the full amplitude corrections using the restricted PN waveforms. For this study, we 
include corrections to the phase up to 2-PN order. 

In the restricted post-Newtonian approximation, the GW polarizations are given 
by [35] 

h+ = 2 ^ T] (l + cos 2 i) x cos($), (8) 
c z Dl 



h x = - 



AGmrj 

^7 



cos l x sin(<I>). 



(9) 



Here m — mi + 777,2 is the total mass of the binary, 77 = mira2/m 2 is the reduced mass 
ratio, G is Newton's constant and c is the speed of light. The inclination of the orbit 
of the binary system is formally defined as cos t = L • ft. The invariant PN velocity 
parameter is defined by x = (Gtocj/c 3 ) 2 ^ 3 , where 



;(t) = 



e-3/8 + 



+ 



8Gm 
/ 1855099 
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7 r f 



e-r/s 



(10) 

is the 2 PN order orbital frequency for a circular orbit formally defined as uj = 
d&orb/dt, and $ = ip c — ip(t) = 2$ or ; ) is the gravitational wave phase which is defined 
as 



Q5/8 



n 

9275495 
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h V8064 
284875 
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The time dependent quantity 0(i; t c 
t c , by 



(11) 

is related to the time to coalescence of the wave, 



e(*;* c ) 



err] 
bGm 



{tc - t) 



(12) 



As they are used interchangeably in our analysis, the relationships between (m, rj) 
and (M c ,/z) are given by M c = m?y 3 / 5 and \i = mr\. In Table Q] we present the 
parameter values and the prior ranges for both MBHBs that we use in this analysis. 
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Case A 


Case B 


mi /Mr^ 
Ul\j iKIQ 


5 x 10 5 


1 x 10 6 


mo /Mm 


3 x 10 5 


2 x 10 5 


6/rad 


0.6842 


1.6398 


(j) 1 vad 


2.5791 


4.1226 


t„ /yr 


0.458333 


1.04 


L J TCld 


0.9273 


0.7647 


ip 1 vad 


1.4392 


1.373 


z 


10 


1.5 


D L /Gpc 


106.28 


11.01 


tpc/rad 


0.3291 


0.9849 


f™*{z)/mHz 


0.3965 


0.14584 


mf m /M Q 


5 x 10 6 


1 x 10 6 


mf ax /M Q 


5 x 10 7 


1 x 10 7 


(mi/m 2 ) m in 


5 


15 


{m 1 /m 2 ) ma x 


1 


5 


t™ in /yr 


0.25 


0.75 


t™ ax /yr 


0.75 


1.25 


SNR 


40.989 


97.39 



Table 1. The parameter values and priors for the two test cases studied. In 
the above table, both the maximum gravitational wave frequency and total mass 
priors quoted are redshifted values. 



We should point out that the individual masses quoted are the rest-frame masses, while 
the maximum GW frequency reached and the priors for the total mass are given in 
redshifted values. The priors were chosen to encompass an astrophysically interesting 
range, and also not tight enough as to make sure that the chain would definitely find 
the sources. 

While we initially define our sources in terms of redshift z, it is the luminosity 
distance Dl that enters into the waveform equations. Using the WMAP values 
of (Vl Rl tt M ,VL A ) = (4.9 x 10" 5 , 0.27, 0.73) and a Hubble's constant of H =71 
km/s/Mpc [20) . the relation between redshift, z, and luminosity distance, Dl is given 

by 

1 „ ' / dz' n R {i + z'f + n M (i + z'f + fi A . (13) 



Dl = IT 

For two Schwarzschild black holes, Equations (flTjf and (fTTj) governing the 
evolution of the frequency and phase break down even before we reach the last stable 
circular orbit (LSO) at R = 6M. Because of this, if we are examining a case where 
the source coalesces within the observation period, we terminate our waveforms at 
R = 7M. Due to the fact that we are using numerical Fourier transforms in our 
analysis, if we just truncate the time domain waveform, we get spectral leakage and 
intense ringing in the Fourier domain. It is also a requirement of the Fourier routines 
that we use that each waveform array is an integer power of 2. We therefore need to 
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truncate our waveform arrays smoothly and pad the rest of the array with zeros. In 
order to do this, we introduce the following hyperbolic taper function 

F(x,x ma x) = ^I 1 _ tanh(/c{x - x max })]. (14) 

with which we multiply the amplitude of the wave. In the case where we see 
coalescence, we set x max = 1/7. In the case where we do not see coalescence, we use 
the following methodology. Solving a non-linear equation in Q of the form g(9) = 1/7, 
we find @ m in{t), and from this calculate the maximum frequency, and hence the 
maximum velocity reached during the time of observation. We then set this value 
of x m ax in the taper, and once again pad the waveform arrays with zeros after this 
velocity has been reached. The steepness parameter k in the taper is chosen to be 
sufficiently large (> 100) that it cuts off the time domain waveform smoothly, while 
reducing the ringing in the frequency domain. 



3. Measuring and estimating parameter errors with LISA. 

3.1. Background theory. 

Most of the current methods of parameter measurement and error estimation are 
based on a geometric model of signal analysis [37l [38J, 139] . In this treatment, the 
waveforms are treated as vectors in a Hilbert space. This vector space has a natural 
scalar product 

1 /2 

and vector norm \h\ = (h\hy', where 

poo 

h(f)= / dth{t)e^ lft (16) 



is the Fourier transform of the time domain waveform h(t). The quantity S n (f) is the 
one-sided noise spectral density of the detector and will be defined at a later stage. 

As previously stated, we can think of the three arms of the LISA interferometer as 
being two separate 90° interferometers. In this case, the signal in each of the detectors 
is given by 

Si (t) = hi(t)+ni(t), (17) 

where i = 1,11 label each detector. We assume that the noise rii(t) is stationary, 
Gaussian, uncorrelated in each detector and characterized by the noise spectral density 
S n (f). Using the scalar product defined in Equation (fT5|) . we define the SNR in each 
detector as 

Pi = -£toL (18) 
Closely related to the SNR is the normalized overlap between two templates defined 

by 

Q = - (/tlM . (19) 

Now, given some signal s(i), the likelihood that the true parameter values arc given 
by some parameter vector x is described by 

C{x) = Ce~^ h ^ s - h ^/ 2 , (20) 
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r 



(22) 



where C is a normalization constant. To achieve the Maximum Likelihood, we search 
for a parameter set that minimizes the exponent in the above equation. For most 
MBHBs, the SNR will be high. In this high SNR limit, the errors in the parameter 
estimation will have a Gaussian probability distribution given by 

p(Af) = y|^ r - A ^ (21) 

where V ^ v is the Fisher information matrix (FIM) 
/ dh 

\ dx^ dx u 

and r = det (T^). When we use both LISA detectors the total SNR is given by 

P = P I + P 11 , (23) 
while the total FIM is given by 

r^ = r^ + r^. (24) 

Once we have the total FIM, we numerically invert to get the variance-covariancc 
matrix 

C^ = T-J. (25) 

The diagonal elements of C^ v give a ler 2 estimate of the error in the parameter 
estimation, i.e. 



Ax^ = VO^ 7 , (26) 
while the off-diagonal elements give the correlations between the various parameters 

- 1 < c"" < 1. (27) 



Theoretically, in the large SNR limit, C^ v is the Cramer-Rao bound. The problem is 
that it is very difficult so say what exactly constitutes the large SNR limit. For low 
SNRs, the assumption that the errors have a Gaussian probability distribution breaks 
down. 



3.2. The noise power spectral density. 

We can see from the above equations that the scalar products and calculation of the 
FIM are all weighted with the one-sided power spectral density S n {f). The noise 
spectral density in the detector can be written in terms of the autocorrelation of the 
noise 

(n(/)n*(/')> = ^S ff S n (f), (28) 

where T b s is the time period of observation and the angular brackets denote an 
expectation value. The total noise power spectral density in our analysis can be broken 
into two parts : instrumental and galactic. To model the instrumental noise we use 
the standard one-sided noise spectral density for the LISA detector given by [40] 

1 

4~L 2 



ST (/) = 772 



SP° s (f) + 2(1 + cos 2 5 « CC ( / ) 



(29) 



J* J J (2tt/) 4 . 

where L = 5 x 10 6 km is the arm-length for LISA, S* os (/) = 4 x IQ- 22 m 2 / H z and 
S® cc (f) — 9 x 10 -30 m 2 /s A /Hz are the position and acceleration noise respectively. 
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f(Hi) 



Figure 1. A plot of the power spectra of our two test sources. The spectra of the 
signals is plotted against the combined instrument noise and galactic foreground. 
The solid line running through the noise is the rms noise for the combined noise 
sources. 



The quantity /* = l/(2nL) is the mean transfer frequency for the LISA arm. 
Using this above formula, we generate instrumental noise from a random Gaussian 
distribution. 

As well as the instrumental noise, the other main source of noise comes from 
the galactic population of binaries. To simulate the galactic foreground we generate 
the response to 26 million individually modelled galactic binaries and superimpose 
the responses [34] . While some of these binaries are bright enough that they will 
be individually resolved, the vast majority will not be. The millions of unresolvable 
galactic binaries which constitute the galactic foreground produce a noise source which 
is commonly referred to as confusion noise. To model the galactic or confusion noise 
we use the following confusion noise estimate derived from a Nelemans, Yungelson, 
Zwart (NYZ) galactic foreground model [3H [JT] 

- 10 -44.6 2/ -2.3 10-4 </< 10-3 



sr f (f) = 



10 -50.9 2/ -4.4 1() -3 < f < 1Q-2.7 
lQ-62.8^-8.8 iQ-2.7 < f < 1Q -2.4 



(30) 



10 -89.68 / -20 iQ-2.4 < f < 1Q -2 

where the confusion noise has units of rn^Hz^ 1 . In our search analysis, we weight our 
scalar products with the total power spectral density 

Sn(f) = Sr(f) + S c n onf (f)- (31) 
In Figure [T] we plot the power spectra of both sources against the instrument noise 
and the galactic foreground of approximately 26 million sources. The solid line in the 
figure is the rms noise of the combined noise sources. We should mention here, that 
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once we introduce the galactic foreground, we automatically lose the assumption that 
the noise is stationary and Gaussian. We will see later in the paper how this effects 
the statistics of the recovered parameter values. 



3.3. The generalized F-Statistic. 

One of the advantages of dividing the parameter set into extrinsic and intrinsic, is 
that we can project onto the orthogonal subspace defined by the intrinsic parameters 
to reduce the dimensionality of the search. Instead of having to search over the 
full 9-d search space, we only need to search the subset of intrinsic parameters 
{ln(Af c ), ln(/x), ln(t c ), 0, <fi}. We can use the F-statistic [42] to find the optimal values 
of the four extrinsic parameters {t, ip, Dl, if c }- This is done as follows. We first write 
the strain of the gravitational wave in the form 

4 



where 



a% = A [(l + cos 2 i) cos 2ip cosf c — 2 cos t sin 2ip sin y> c ] 
a 2 = — A [ (l + cos 2 t) sin 2%l> cos ip c + 2 cos t cos 2ip sin y> c ] 
a 3 = A [(l + cos 2 l) cos 2ip sin <p c + 2 cos i sin 2tp cos ip e ] 
04 = — A [ (l + cos 2 i) sin 2\p sin ip c — 2 cos t cos 2ip cos 95 c ] , 



(33) 



and 



A 1 = m r] x(t) D + cos((p) 
A 2 = m r\ x(t) D x cos(ip) 
A 3 = m rjx(t) D + sin(^) 

A 4 = m nx(t)D x sm(ip). (34) 

Here A = c/Dl and m = Gm/c 3 is the total mass in seconds. We can see that the 
time-independent a, quantities give us a series of four equations in four unknowns. By 
defining four constants N l = (s \A l ) , where sit) = h(t) + n(t) is the signal, we can 
find a solution for the a^'s in the form 

ai = MijN j , (35) 

where the M-Matrix is defined by 

My = (Afiy 1 = (A 1 {A 1 )' 1 . (36) 



If we now substitute the above equation into the expression for the reduced log- 
likelihood, i.e. 

ln£(x) = (s\h(x)) -l(h(x)\h(x)) , (37) 



we obtain the F-statistic 
1 

which automatically maximizes the log-likelihood over the extrinsic parameters and 
reduces the search space to the sub-space of intrinsic parameters. Once we have the 



F=-M ij N i W, (38) 
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numerical aj's, we can set about finding the maximized extrinsic parameters. Denning 
the quantities 

I 2 2 / 2 2 

A + = \J (a 1 - a 4 ) + (a 2 + a 3 ) + y (oi + a 4 ) + (a 2 - a 3 ) , (39) 

/ 2 2/2 2 

A x = y(ai-a 4 ) + (a 2 + a 3 ) - y (ai + a 4 ) + (a 2 - a 3 ) , (40) 

and 

A = A + + ^ + A 2 X , (41) 
we find solutions for the four extrinsic parameters in the form 

l = arccos ^ ^ x ^ , (42) 

1 f-(A +ai + A xai )\ 

tb = - arctan — y — '- , (43) 

2 \- (A x a 2 - A+a 3 ) J ' 1 ; 

^ = $> (44) 

= arctan f Z£^±+A^ll ) , (45 ) 



-c(A+a 2 - ^x^), 

where c = sgn(sin(2'0)). We should note that the F-Statistic is related to the SNR by 
^<ln£) = ^±^, (46) 

where in this case the angular brackets again denote the expectation value and D is 
the dimensionality of the search space. 



4. Metropolis-Hastings sampling with simulated and frequency annealing. 

The Metropolis-Hastings sampling method is a variant on the Markov Chain Monte 
Carlo method (MCMC), and works as follows : starting with the signal s(t) and some 
initial template h(t), we choose a starting point randomly in the parameter space x. 
We then draw from a proposal distribution and propose a jump to another point in 
the space y. In order to compare both points, we evaluate the Metropolis-Hastings 
ratio 

H = K(y)p(s\y)q(x\y) 
n(x)p(s\x)q(y\x) 

Here ir(x) are the priors of the parameters, p(s\x) is the likelihood, and q(x\y) is the 
proposal distribution. This jump is then accepted with probability a = min(l,H), 
otherwise the chain stays at x. 

The above steps are the basis of most MCMC methods. However, as a search 
algorithm on its own, the MCMC method is inefficient and would take too long to find 
the source we are looking for. It also suffers from other limitations. If the source is 
particularly bright and we start too far away from the true parameter values, it is very 
easy for the chain to get stuck on a secondary or tertiary maximum. Once the chain 
gets stuck, it is very difficult for it to move off the local maximum. In order to actually 
convert the MCMC method into a bona fide search algorithm, we need to use simulated 
annealing and a variety of proposal distributions that allow a range of different size 
jumps in the parameter space. We also need to include other non-Markovian methods 
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that we will describe below. It is this combination of Metropolis-Hastings and non- 
Markovian methods that we term Metropolis-Hastings Monte Carlo (MHMC). First of 
all we will describe the various proposal distributions used in our algorithm, followed 
by t c maximization and simulated annealing. 

In order to make small jumps in the parameter space, the most efficient proposal 
distribution to have is a multi-variate Gaussian distribution. The multi-variate jumps 
use a product of normal distributions in each eigendirection of the FIM, T^. Here 
the Latin indices denote that this is the FIM on the 5-D subspace of the intrinsic 
parameters. The standard deviation in each eigendirection is given by <7j = 1/y/DEi. 
Here D is the dimensionality of the search space. As we are using the F-Statistic in 
our search phase, D — 5 and Ei is the corresponding eigenvalue of Yij. The factor of 
1/y/D ensures an average jump of ~ la. 

The medium and large jumps were made from a variation of the same move. 
In order to make a medium jump, we made a uniform draw in the five intrinsic 
parameters of ±10er. The large jumps came from a full range uniform draw on the 
intrinsic parameters, where the parameter range is defined by the priors for the input 
parameters. 

Due to the quadrupole nature of the beam pattern functions of the detector, we 
also found it necessary to include a move that changed sky location by 

e^n-e , 4)^4>±ir. (48) 

By far the most important proposal distribution was a jump in the parameters 
{lnM c , ln/i, lni c }. While investigating the likelihood surface for MBHBs, we found 
that there were "island chains" of maxima running across the surface. In Figure [2] 
we plot an example of these chains in the {lnM c ,lni c } slice through the Likelihood 
surface. Each cell of the plot represents a certain time to coalescence after the 
observation period has finished (we should mention here that for clarity, we set all 
points with a log-Likelihood less than zero equal to zero. In reality, alongside each 
maximum, there is a minimum of equal or greater depth). We can see that if the 
system coalesces a month after we finish observing there is a distinct chain of maxima 
almost bisecting the surface at an angle of 7r/4. We should point out that while the 
islands are symmetric about the major axis of the central ellipsoid, the ellipsoids on 
one side of the center are flipped and rotated with respect to the ellipsoids on the 
other side of the image. As we move through the cells from left to right, top to 
bottom, we notice that as the time of observation approaches the time of coalescence, 
two things happen. Firstly, as expected, the central peak becomes more dominant 
on the Likelihood surface and the number of secondary peaks gets smaller. Secondly, 
the entire chain begins to rotate as we acquire more information, breaking down the 
high correlation between various parameters. As well as moving around the rest of 
the likelihood surface, we found that if we convinced the code to jump from island to 
island, we could greatly speed up the rate of convergence of the algorithm. Therefore, 
our most important proposal distribution was to allow jumps in the eigendirections of 
these three intrinsic parameters according to 

x l -> x l + Ay/Ax 1 (49) 

Here, A is a scaling constant and the small displacement in the eigendirection of each 
parameter, Ax 1 , is given by 
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Figure 2. This Figure displays the evolution for the island chain of maxima on 
a In (M c ) — In (t c ) slice through the Likelihood surface for coalescence times that 
exceed the observation time. 

where once again Ei is the corresponding eigenvalue of the parameter, and Vij are the 
eigenvectors. 

4-1- Simulated annealing and t c maximization. 

As already stated, in a blind search, as we do not know where the actual signal lies, we 
could spend an eternity searching the likelihood surface. To ensure a more acceptable 
movement in the chain, we use simulated annealing to heat the likelihood surface. 
This smooths any bumps on the likelihood surface and spreads the width of the signal 
peak over a wider range. This gives us a greater chance of moving uphill towards the 
summit of the peak. In the definition of the likelihood, Equation (f2T))) we can see that 
there is a factor of 1/2 in the exponent. We can replace this value with the parameter 
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(3, such that (3 is given by 

f ilO"^ 1 -*) 0<i<T c 
(3=1 , (51) 

I 5 «>T C 
where £ is the heat-index defining the initial heat, i is the number of steps in the chain 
and T c is the cooling schedule. 

We should point out that one of the downsides of simulated annealing is that the 
initial heat factor and cooling schedule are dependent on the type of source we are 
looking for. While the initial heat should be chosen high enough that the chain makes 
full range jumps in the parameters, it is harder to tell when we are overheating the 
surface. We have found, by running many different chains, that the safest and easiest 
way to run the search is to chose a constant high value of (3 for all searches. However, 
this is wasteful, as low SNR sources do not require the same amount of heat as high 
SNR sources. Generally we found that a cooling schedule of 10 4 steps was more than 
sufficient if the initial heat had been chosen correctly. It is important that the cooling 
schedule is not too short. Even starting with a high heat, we found that if we cooled 
the chain too quickly it was likely to get stuck at a local minimum for a long time and 
not properly explore the parameter space. 

The second measure we use to accelerate the convergence of the algorithm is to 
maximize over the time-of-coalescence, t Cl during the cooling phase of the chain. We 
have found that this parameter is the most important one to find. Once we have t Cl 
due to the fact that the parameters are highly correlated, the mass parameters are 
usually found very soon after. Strictly, this is not a correct step to use as it assumes 
that LISA is stationary. However, during the annealing phase we treat t c as being 
a quasi-extrinsic parameter and search over it separately. Once the cooling phase 
has finished we stop this maximization. The advantage of including this step in the 
search is that it manages to tie t c down to a restricted search range very quickly. Our 
t c maximization is carried out using a modified F-Statistic search. Using the usual 
Fourier domain t c maximization, the equations for the F-Statistic take on a slightly 
different form. This time instead of defining the four constants iVj, wc define the 
matrix 

n 4 

^' = 4 EE(^+^/)' (52) 

i=l j=l 

where n is the number of elements in the waveform array. The above equation describes 
the four constants at different time lags. We can now solve for the time independent 
amplitudes 

n 4 4 
i=l j=l k=l 

where the M-Matrix is the same as the one defined previously. Inverting the M-Matrix, 
the F-Statistic is now a vector over different time lags. 

n 4 4 

^ = EEE M ^ jArife ( 54 ) 
i=i j=i k=i 

We then search through this vector array to find the value of t c that maximizes the 
F-Statistic. While it is possible to carry out a search without this maximization, it 
takes much longer to converge. In fact, with t c maximization the algorithm has found 
the time to coalescence of the MBHB in as little as 10 steps of the chain. 
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4-2. Frequency annealing. 

Due to limited computing power and the fact that we need to run the searches more 
than once to be sure we have found the correct source, we felt it necessary to improve 
the speed of the search algorithm. For very high mass systems our simulated annealing 
code runs very quickly due to short waveforms. However, as we start to look to lower 
mass systems, the computation time increases. Therefore, instead of dealing with 
the entire search bandwidth, we choose to start off with a small subset of the data 
and increase the frequency range of the subset as the search progresses. The main 
advantage of this method is that the initial part of the search is extremely fast as the 
array sizes are very small. We have found in our runs that we usually have found the 
source by time we get to the costly part of the run. This has the added advantage 
that the computationally expensive part of the algorithm is now more of a parameter 
refinement algorithm than a search algorithm. 

The algorithm works as follows: we define a maximum search bandwidth 
frequency, f m ax, which is a function of the lowest total mass in the priors. We also 
define a bandwidth cut-off frequency, f cut . The initial upper cut-off frequency is chosen 
to be a multiple (at least 2, but in most cases 4) of the lower frequency cutoff of LISA. 
Defining a growth parameter 

B = log ( lap-) a > 2, (55) 

V a J cut J 

we evolve the upper cut-off frequency according to 

10 ~ V ~) f f < f 

r ±XJ J max J ^ Jmax (rn\ 

fcut = < , (56) 

fmax f 2^ fmax 

where i is the number of steps in the chain and Nf is the total number of steps in 
the frequency annealing chain. One of the main advantages of frequency annealing is 
that it acts as a kind of simulated annealing. While the initial search is moderately 
accelerated by including simulated annealing, on the searches we have run, there is 
no difference in the rate of convergence by having the extra annealing included. We 
therefore rid ourselves of having to make the best educated guess at what the initial 
heat factor should be. 

The final refinement we use is to introduce a thermostated heat factor. As the 
frequency annealing is a progressive algorithm (i.e. it does not see the full signal until 
very late in the run) we do not want it to get stuck on a secondary maximum. To get 
around this we use a thermostated heat of 

r 1.0 < SNR < X 

S = \ , ,2 > (57) 

( (^P) SNR> X 

which ensures that once we attain a SNR of greater than x, the effective SNR 
never exceeds this value. This thermostated heat means that the chain explores the 
parameter space more aggressively and as we get towards using the full template, there 
is enough heat in the system to prevent the chain from getting stuck. We found in 
this analysis that it is sufficient to have \ — 20. In this work, we ran the frequency 
annealing for 20,000 steps of the chain. At the end of this search phase we take the 
current value of the thermostated head 5 and cool to a heat of 5 — 0.01 over another 
10,000 steps to find the Maximum Likelihood Estimates for the parameters. 
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Figure 3. A plot of the search chains for the case where we see the coalescence of 
the wave using simulated annealing. The solid horizontal line in each cell denotes 
the true value of the parameters. 



5. The Search for MBHBs. 



We can divide the types of sources we are looking for into two distinct groups : Those 
where we see coalescence and those where we do not. Each case presents its own 
complications and must be treated slightly differently, especially when we come to 
mapping out the posterior distributions of the parameters and estimating the errors 
involved. Due to this division, we treat each source type in turn. 

We should mention that Case A (Table Q]) belongs to a class of astrophysically 
interesting sources. There is a lot of evidence to suggest that modern galaxies formed 
from the coalescence of "seed" galaxies in the distant universe. In order to investigate 
whether or not LISA will be able to observe such sources, we have chosen a seed galaxy 
containing a MBHB with expected low masses at a high redshift of z = 10. 



5.1. Looking for signals where coalescence is observed. 

We produced simulated LISA data streams using the parameter values for Case A 
from Table [1] and ran a search over N = 2 x 10 4 steps using both simulated and 
frequency annealing based searches. In the case of the simulated annealing we chose 
a heat-index of £ = 2 giving an initial heat of 100, while the frequency annealing 
started with f sn i P — 4 x 10~ 5 Hz. In both cases we chose a maximum bandwidth 
of fmax — 2/^aa;(7M), where /^j,(7M) is the maximum frequency of the injected 
signal at 7M. We then sampled with a cadence of r samp — 2f max for the simulated 
annealing. For the frequency annealing, while the input signals are generated at the 
above cadence, the templates are generated at a cadence of 2f snip while f sn %p < fmax- 
When fsnip = fmax, we once again sample at r samp = 2f max . At the end of the search 
phase, we cool the chains to a heat of 0.01 over another 10 4 steps. This gives a reliable 




Figure 4. A plot of the search chains for the case where we see the coalescence 
of the wave using a frequency annealing search. The solid horizontal line in each 
cell denotes the true value of the parameters. 



value of the Maximum Likelihood Estimate (MLE). 

In Figures [3] and [4] we have plotted the search chains for the frequency and 
simulated annealing methods. We should point out that because of the nature of 
the source, the algorithm does not find the sky positions until very late in the chain. 
It is because of this that we have plotted the chains for 9 and i/jona linear-linear scale, 
while all others are plotted on a log-linear scale. In each cell, the solid horizontal line 
represents the true values of the parameters. 

While both methods found the injected source, we can see from the figures that 
there are differences in the behaviour of each algorithm. For example, we see a typical 
evolution in terms of the parameter convergence for the time to coalescence and both 
mass parameters. With simulated annealing, the correct value of t c has been found 
in about 100 steps in the chain. The two mass parameters then lock in in under a 
thousand steps. We see that although the sky locations do not lock in until late in 
the search, we have recovered most of the SNR after 2000 steps in the chain. On the 
other hand, it takes the frequency annealing almost 4000 steps before it closes in on 
t c , M c and /i. This is due to the controlled growth in the available SNR that occurs 
when using frequency annealing. 

In Table [2] we quote the MLEs at the end of the freezing phase for the frequency 
annealing chain (the values from the simulated annealing chain are almost identical), 
the standard deviation as predicted by the FIM at the injected parameter values 
and how close, in terms of multiples of the FIM standard deviations, the recovered 
values are to the injected values. The retrieved values give a normalized overlap 
of O = 0.9955. We should comment here on how well we retrieved the parameter 
values of the injected signal. We mentioned earlier that we would expect that the 
parameter errors will have a Gaussian distribution around the correct value as given 
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3.684932 x 10 6 


3.686128 x 10 6 


2.2255 x 10 3 


-0.53 


(M/M e 


2.0625 x 10 6 


2.06342 x 10 6 


1.59 x 10 4 


-0.058 


e/rad 


0.6842 


0.688477 


4.2397 x 10~ 2 


-0.1 


4>/rad 


2.5791 


2.59448 


6.0353 x 10~ 2 


-0.26 


tc/yr 


0.458333 


0.4583297 


7.429 x 10~ 6 


0.49 



Table 2. This table compares the injected parameter values, the MLE for each 
parameter, the la error predicted by the FIM at the injected values and the 
closeness of the MLEs to the injected values in multiples of the FIM la error 
estimate for the source at z = 10. 



by the FIM. If this was true, we would expect only 67% of our answers to lie within 
la. We attribute the better than expected performance to the anisotropic nature 
of the galactic foreground. The Fisher matrix analysis uses a sky averaged estimate 
for the galactic confusion noise, while the simulated galactic foreground is highly 
localized. MBHBs that are away from the galactic plane are less affected by the 
galactic foreground, and it is possible to beat the all-sky Fisher matrix error estimates. 

It is also interesting to point out that there is a degeneracy in the solutions for 
the sky location. In the LFA, the antennae response for LISA is almost perfectly 
quadrupole. In the next section, where we look for a source where we do not see 
coalescence, we will include a chain that ended up on this degenerate solution. We 
have found that when this happens, the difference in SNR between the two solutions 
is minimal. In fact, for this particular case, one of the chains that found the alternate 
sky solution had a normalized overlap value of O — 0.9954, which is almost identical 
to the above solution. In practice, this means we cannot call any sky solution that is 
out by 7r — 9 in or ±7r in </> a wrong answer. In fact these solutions are just as valid as 
the "real" values. When we have run the chains multiple times with different starting 
seeds we observe that the chains quite happily interchange between the antipodal sky 
solutions. 

Using the recovered parameters we plot the residuals of the signal, i.e. \hi n j— h rec \, 
in Figure [5l We also plot the residuals from a run where we found the degenerate sky 
solution. We can see from the plot that besides a slight excess of power at high 
frequencies, there is not much difference in the two solutions. 

The main advantage of using the frequency annealing algorithm, is how quickly 
it locates the source as compared to using simulated annealing with the full signal. In 
general the frequency annealing algorithm found the source in the space of minutes, 
while the simulated annealing algorithm took hours. To compare the full run times 
(initial detection and parameter refinement), the codes were run on an AMD Athlon 
2700+ with 1GB of memory. For the case in question, the computing time was ~ 7.6 
hours to run the simulated annealing code and ~ 2.7 hours to run the frequency 
annealing code. 
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Figure 5. A plot of the residuals for the source at z = 1.5. The cell on the left 
shows the residual when we find the correct position on the sky. The cell on the 
right shows the residual when we find the secondary solution on the opposite side 
of the sky. As a point of reference, we have also included the rms noise level for 
the combined instrument and galactic foreground noise. 

5. 2. Looking for signals where coalescence is not observed. 

It turns out that the most interesting case is when we do not see the coalescence, 
which corresponds to LISA providing an early warning to the rest of the astronomical 
community. The main problem for searching for signals with T b s < t c is that the 
phase at coalescence is poorly determined. The search chains for these type of systems 
reveal that ip c conducted virtually full range jumps over the length of the chain. The 
indeterminacy of ip c has little effect on the search phase, but it does make it difficult 
to conduct a full 9-D exploration of the parameter posterior distribution function. 
Using the values for Case B in Table Q] we will treat both the search and posterior 
exploration phases in turn. 

In Figures [6] and [3 we have again plotted the frequency and simulated annealing 
search chains. We can see that both algorithms again find the injected source, however 
this time we have plotted a simulated annealing chain where we find the degenerate 
sky solution. We can see from the bottom right cell in Figure[6]that there is no obvious 
reduction in SNR as compared to the frequency annealing search. For the simulated 
annealing search, we started with a heat-index of £ — 1.7 corresponding to an initial 
heat of almost 50. The frequency annealing was once again started at 4 x 10~ 5 Hz. 
We once again see that the important parameter to determine is t c . In the simulated 
annealing case, the chain has closed in on the correct vicinity of t c in about 10 steps 
of the chain and locks in properly after roughly 300 steps. We also see that the chirp 
mass is also determined in about 300 steps. It is interesting to see for the simulated 
annealing chain that as the chain locks in on the alternate sky position, it prevents 
the reduced mass from being more refined. We can also see that due to the sky flips 
the chain does actually visit the correct sky location, but spends more of its time on 
the secondary solution. 

In contrast, the frequency annealing chain takes longer to lock in on the true values 




Figure 6. A plot of the search chains for the case where we do not see the 
coalescence of the wave using simulated annealing. The solid horizontal line in 
each cell denotes the true value of the parameters. Note that in this case the 
chains find the alternative sky solutions. 



(~2000 steps of the chain). This is not surprising in this case. As we previously said, 
we chose the initial frequency cut-off to be 4 x 10 -5 Hz. For this signal, its initial 
frequency when it enters the LISA bandwidth is 4.33 x 10~ 5 Hz. Therefore it takes 
a few hundred steps before the source even enters into the search, and then another 
couple of thousand steps before it accumulates enough enough SNR to resolve the 
key search parameters. Note, that the first few thousand steps of the code where the 
source has been found takes only a few minutes to run. 

In Table [3] we quote the MLEs at the end of the freezing phase, the standard 
deviations as is predicted by the FIM and the closeness of the parameter fit in 
multiples of the FIM prediction. These parameters, which are from the frequency 
annealing chain which found all of the parameters correctly, give a normalized overlap 
of O — 0.99985. The MLEs from the simulated annealing chain that found the 
alternative sky solution gave an overlap of 0.9989, showing once again that it is 
virtually impossible to discern between the two solutions. 

6. Mapping out the posterior distribution functions using an adapted 
MCMC algorithm. 

One of the problems for the MCMC method is that very strong correlations between 
parameters, can be detrimental to good mixing of the chain. In the case where we do 
see the coalescence, the correlations between the parameters is sufficiently mild that 
we can run a standard MCMC chain to explore the posterior distributions over the 
entire 9-D parameter space [TT| . 

However, the posterior exploration is more difficult prior to seeing the MBHB 
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Figure 7. A plot of the search chains for the case where we do not see the 
coalescence of the wave using a frequency annealing search. The solid horizontal 
line in each cell denotes the true value of the parameters. 
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1.635726 


9.52755 x lO- 3 
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<j)/rad 


4.1226 
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0.43 


tc/yr 


1.04 


1.0399988 


3.053062 x 10~ 5 


0.037 



Table 3. This table compares the injected parameter values, the MLE for each 
parameter, the lcr error predicted by the FIM at the injected values and the 
closeness of the MLEs to the injected values in multiples of the FIM lcr error 
estimate for the source at z = 1.5. 



merger. In Figure [9] we have plotted a ln(/i) — ip c slice through the Likelihood surface 
for the case where do not see the coalescence of the MBHB. We see that there 
is an extremely narrow extended ridge on the log-Likelihood surface. To properly 
understand what is going on, we need to imagine the image as being an unrolled 
cylinder. We can see that this ridge wraps itself around the cylinder multiple times. 
In order to explore the posterior distributions properly, we need to travel up and down 
this ridge many times. In fact, the further we are out from coalescence, the greater 
the number of times the ellipsoid wraps itself around the cylinder. We can also see 
from the image the danger of either neglecting or fixing a constant value for (p c . Doing 
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Figure 8. A plot of the residuals for the source at z = 10. The cell on the left 
shows the residual when we find the correct position on the sky. The cell on the 
right shows the residual when we find the secondary solution on the opposite side 
of the sky. As a point of reference, we have also included the rms noise level for 
the combined instrument and galactic foreground noise. 
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Figure 9. A plot showing the elongated ellipsoid around the true values of ln(/i) 
and (p c for the case where we do not see the coalescence of the binary. If we think 
of the figure as representing an unfolded cylinder, we can see that the ellipsoid 
wraps itself around the cylinder multiple times. The further we are out from 
coalescence, the greater the number of times the ellipsoid wraps itself around the 
cylinder. 
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Figure 10. A plot of the marginalized PDFs for a straightforward 9-D MCMC 
exploration of the waveform parameters. Note that the means of the chain have 
been subtracted and the values scaled by the square root of the variances of the 
chains. The solid line in each cell denotes the correct value. In the cases where 
there is no solid line, this indicates that the chain has begun a random walk which 
has resulted in the mean of the chain being at a distance of > 4<j from the correct 
answer. 



so would give incorrect errors for the other parameters as the chain will be confined 
to a small portion of the Likelihood surface. 

To explore this further we first ran a full 9-D search using a normal proposal 
distribution to try an map out the posterior distribution functions (PDFs). The 
search was run for 0.5 x 10 6 steps with an initial period of 10 4 steps where the chain 
was run with a gentle simulated annealing with an initial heat of about 20. This initial 
period is to allow the parameters to modify themselves slightly after the transition 
from a 5-D search to a 9-D exploration. We should note that if the starting parameter 
values in the MCMC are slightly away from the correct values (i.e 10's of sigmas), they 
will migrate to better values, but without the simulated annealing it just takes longer 
for this to happen. In Figure fTOl we have plotted the marginalized PDFs against the 
prediction from the FIM at the mean of the chain. The solid line in each cell denotes 
the correct value. In the cases where there is no solid line, this indicates that the 
chain has begun a random walk which has resulted in the mean of the chain being at 
a distance of > 4cr from the correct answer. Note that histograms have had the mean 
values subtracted and the values are then scaled by the square root of the variances 
from the chains. The Markov chains used to produce these histograms had very high 
auto and cross correlations, and we estimated the chains would need to be hundreds 
of times longer to ensure convergence to the correct posterior distribution. This may 
explain the mismatch between the PDFs and the Fisher matrix predictions, though it 
is also possible that the Fisher matrix predictions are unreliable due to the low SNR 
and the highly correlated source parameters. 
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There are a number of methods that could be used to improve the mixing of the 
chains. One approach would be to use adaptive MCMC techniques such as delayed 
rejection, or pilot chains that produce rough maps of the PDF, which can then be used 
to fashion more effective proposal distributions. Yet another approach is to observe 
that the search chains were little affected by the indeterminacy of ip c since the F- 
statistic eliminates this parameter from the search. On the other hand, the F-statistic 
also eliminates three other parameters that might like to include in our PDFs. One 
way around this is to use a mini F-Statistic that only eliminates two parameters, and 
thus allows us to implement a 7-D MCMC. The mini F-Statistic can be derived by 
writing Equation [3U in the form 

h(t) = 4 ± cos($) J F+ + 4 1 sin($)F x , (58) 

where 

A+ = 2G ™ V x(t) (l + cos 2 l) , 

A x = - 4 ^ x(t) cos i. (59) 



Expanding out the detector response in terms of tp and tp c we get 

cos(y? c ) 

D L 
sin(^ c ) 



h(t) = [A+F+cos^) - A X F X sm(^)] 



[A + F + sm(i P )+A x F x cos( i p)]. (60) 

On closer inspection we can see that the square bracket terms in the above expression 
correspond to the responses h(t;(p c — 0) and h(t;ip c — tt/2) respectively, with the 
luminosity distance set to unity. We can then write the mini F-Statistic in the form 

= cos^) ^ = 0, Dl = 1) + ^) h(t - Vc = tt/2, D L = 1) 

2 

^ a k A\ (61) 



fe=l 



where 



cos(vj c ) sin(</j c ) 

ai = -Dl- ' ° 2 = -D^' (62) 

and 

A 1 = h(t; V c = 0, D L = 1) , A 2 = ft(t; <^ c = tt/2, £) l = 1). (63) 

We then repeat the steps from the generalized F-Statistic until we have the numerical 
values for the quantities a&. We then find the maximized values of <p c and Dl using 

tp c — arctan (~~^J > (64) 

D L = [al + alY 112 . (65) 

In Figure QT] we plot the posterior distributions for the case where we apply the 
mini F-Statistic and search only over seven of the nine parameters. The resulting 
Markov chains showed very little in the way of auto correlation, and the improved 
mixing yielded PDFs that are in good agreement with the Fisher matrix predictions 
for the five intrinsic parameters. The Fisher matrix predictions for the PDFs of the 
polarization and inclination angles are less accurate, but still reasonable. 
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Figure 11. A plot of the marginalized PDFs for a 7-D MCMC exploration of 
the waveform parameters where we have maximized over ip c and Dl. Note that 
the means of the chain have been subtracted and the values scaled by the square 
root of the variances of the chains. The solid line in each cell denotes the correct 
value. 



7. Searching for multiple Black Hole Binaries. 

In this section we investigate the extraction of MBHBs when there is more than one 
system in the data stream. Taking the two black hole binaries that we have already 
searched for individually, we injected both signals into the data stream. While the 
optimal method would be to search for both MBHBs at the same time, in this instance 
we carry out the search iteratively. Our plan is to subtract each source we find from 
the data set before commencing the search for the next source. Once again we use 
a frequency annealing method to extract the systems. While in practice one would 
use a global thermostated heat, our main objective here was to repeat the results of 
the individual searches when we had the two MBHBs present. Therefore, we used the 
same thermostated heat factors for the individual sources in the two systems search 
as we did in the single MBHB search. 

The search for the two MBHBs was carried out multiple times with both frequency 
and simulated annealing methods, and in all cases the algorithm extracted the high 
SNR source first. In none of our runs did the algorithm decide to go after the low 
SNR source first. In Figure [P2l we present the search chains for the two MBHB search. 
Once again, the initial search phase was run for 20,000 steps, with an additional 
freezing period of 10,000 steps to extract the MLEs for the parameter values. We can 
see from the search chains that there is almost no difference between the chains for 
the individual searches and the chains for the double MBHB search. We should not 
be very surprised by this as the normalized overlap between the two sources is just 
-0.0203. 

The important question that has not been answered elsewhere is what effect 
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Figure 12. A plot of the search chains for the case where we search for both 
MBHBs using a frequency annealing search. The solid horizontal line in each cell 
denotes the true value of the parameters. 

will subtracting the MBHBs have on the galactic foreground. In Figure Q2] we plot 
the residuals of the noise, i.e. the difference between the original combination of 
instrumental and galactic foreground noise before the injection of the two MBHB 
sources, and the instrumental and galactic foreground noise after we have searched 
for and subtracted the two MBHBs. We can see from the plot that there is minimal 
contamination of the galactic binaries and instrument noise. This implies that we can 
safely extract the MBHB sources from the data without affecting the search for galactic 
binaries. This supposition is currently being tested in an upcoming publication. 

8. Conclusion. 

In this paper we have presented a new algorithm based on frequency annealing called 
Metropolis-Hastings Monte Carlo (MHMC). We have introduced a galactic foreground 
of approximately 26 million individually modelled sources on top of instrument noise 
to give a realistic representation of the noise in the LISA output. We chose our 
sources so that in one instance coalescence was seen and in another it was not, in one 
instance the source was quite close and bright, and in the other the source was quite 
far away and dim. We demonstrated that both frequency and simulated annealing are 
successful in finding MBHBs in the LISA data stream even when no attempt is made 
to subtract the galactic foreground. The main advantage that frequency annealing has 
over simulated annealing is that it is a more robust search algorithm. In terms of real 
time analysis it finds the source faster than simulated annealing due to the fact that 
initially the sampling rates and array sizes are small. Overall, the total runtime for the 
frequency annealing is also much shorter than the run time for simulated annealing 
allowing us to run many chains over again. 
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Figure 13. A plot of the noise residual after the subtraction of the two MBHBs. 

Once we have found the source, the next step is to map out the posterior 
distribution functions for the source parameters. Due to the fact that the error 
ellipsoid in parameter space is so elongated and narrow, it is very difficult to get 
a straightforward MCMC chain to carry out an exploration of the parameter PDFs 
with reasonable acceptance rates and mixing properties. This problem is amplified 
when we do not see the coalescence of the waveform. We showed that good mixing of 
the Markov chains could be achieved by using a mini F-statistic that extremizcd over 
the phase at coalescence and the luminosity distance. 

Finally, we investigated the case where there were two MBHBs in the data stream 
at the same time. Using an iterative frequency annealing search we extracted both 
black hole binaries. The extraction of each binary had no significant effect on the 
remaining binary nor on the noise in each detector channel. 

This work shows that LISA should be able to detect the inspiral of MBHBs out 
to the distant reaches of the universe. Because of the nature of the source, we will 
also be able to detect multiple sources in the same data stream without effecting the 
galactic foreground. 
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